Training the Hidden Vector State Model from Un-annotated Corpus

نویسندگان

  • Deyu Zhou
  • Yulan He
  • Chee Keong Kwoh
چکیده

Since most knowledge about protein-protein interactions still hides in biological publications, there is an increasing focus on automatically extracting information from the vast amount of biological literature. Existing approaches can be broadly categorized as rule-based or statistically-based. Rule-based approaches require heavy manual effort. On the other hand, statistically-based approaches require large-scale, richly annotated corpora in order to reliably estimate model parameters. This is normally difficult to obtain in practical applications. We have proposed a hidden vector state (HVS) model for protein-protein interactions extraction. The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. State transitions are factored into a stack shift operation similar to those of a push-down automaton followed by the push of a new preterminal category label. In this paper, we propose a novel approach based on the k-nearest-neighbors classifier to automatically train the HVS model from un-annotated data. Experimental results show the improved performance over the baseline system with the HVS model trained from a small amount of the annotated data.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Hybrid Generative/Discriminative Framework to Train a Semantic Parser from an Un-annotated Corpus

We propose a hybrid generative/discriminative framework for semantic parsing which combines the hidden vector state (HVS) model and the hidden Markov support vector machines (HMSVMs). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. The HM-SVMs combine the advantages of the hidden Markov models and the support vector ...

متن کامل

Semi-supervised learning of the hidden vector state model for extracting protein-protein interactions

OBJECTIVE The hidden vector state (HVS) model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. It has been applied successfully for protein-protein interactions extraction. However, the HVS model, being a statistically based approach, requires large-scale annotated corpora in order to reliably estimate model parameters. This is nor...

متن کامل

A semi-supervised learning framework for biomedical event extraction based on hidden topics

OBJECTIVES Scientists have devoted decades of efforts to understanding the interaction between proteins or RNA production. The information might empower the current knowledge on drug reactions or the development of certain diseases. Nevertheless, due to the lack of explicit structure, literature in life science, one of the most important sources of this information, prevents computer-based syst...

متن کامل

Semantic processing using the Hidden Vector State model

This paper discusses semantic processing using the Hidden Vector State (HVS) model. The HVS model extends the basic discrete Markov model by encoding context in each state as a vector. State transitions are then factored into a stack shift operation similar to those of a push-down automaton followed by a push of a new preterminal semantic category label. The key feature of the model is that it ...

متن کامل

Name Tagging with Word Clusters and Discriminative Training

We present a technique for augmenting annotated training data with hierarchical word clusters that are automatically derived from a large unannotated corpus. Cluster membership is encoded in features that are incorporated in a discriminatively trained tagging model. Active learning is used to select training examples. We evaluate the technique for named-entity tagging. Compared with a state-of-...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007